November 14, 2016

R Functions are objects

R is a functional programming language. This means that functions are "objects", just like data frames, vectors, and other things that are assigned to variables and passed to other functions.

A rose by any other name

The name of a functions is actually the name of a variable that contains the function, in the same way that the

log
## function (x, base = exp(1))  .Primitive("log")

This means that we can create a copy of a function by assigning its value to a new variable.

myLogFunction <- log
myLogFunction
## function (x, base = exp(1))  .Primitive("log")

Functions are a kind of data and have a class

myNumber <- 7
class(myNumber)
## [1] "numeric"
class(log)
## [1] "function"

Creating a function

We can create a new function using the word "function" followed by the functions arguments and one or more R statements.

myDumbFunction <- function() 42
myDumbFunction()
## [1] 42

Creating a multi-statement function

If there is more than one statement in a function, they should be enclosed in curly brackets:

doubleIt <- function(x) {
  myResult <- x * 2
  myResult
}
doubleIt(5)
## [1] 10

The last statement within the curly brackets will be the value returned by the function.

Functions live in their own little world

Inside a function, variables that existed in your environment can be used and even changed. However, any changes made, including changing data stored in variables and creating new variables, happens solely within the function. Your environment stays the same.

exists("myResult")
## [1] FALSE
myResult <- 1000
doubleItOutput <- doubleIt(2)
myResult
## [1] 1000

Example Data Set

The data set used in today's lecture comes from an siRNA screen that we published a few years ago. The screen looked for genes that influence parkin translocation.

High-content genome-wide RNAi screens identify regulators of parkin upstream of mitophagy. Hasson SA, Kane LA, Yamano K, Huang CH, Sliter DA, Buehler E, Wang C, Heman-Ackah SM, Hessa T, Guha R, Martin SE, Youle RJ. Nature. 2013.

The data set will be available for download from the lectures portion of the class web page.

Preview the Data

Import the data

ambion <- read.csv("nature.parkin.gw.ambion.csv", stringsAsFactors = FALSE)
str(ambion)
## 'data.frame':    65196 obs. of  25 variables:
##  $ Vendor.Supplied.Gene.Symbol                                            : chr  "LMAN1" "MED15" "PFN2" "KIAA1191" ...
##  $ Sample                                                                 : num  8.91 7.92 7.97 11.2 8.88 ...
##  $ Median.Negative.Control.on.Plate                                       : num  38.6 33.3 31.4 43.8 34.3 ...
##  $ Median.Positive.Control.on.Plate                                       : num  96.7 96.7 97.1 96.5 96.1 ...
##  $ PPT.Sample.as.Percentage.of.Negative.Control                           : num  23.1 23.7 25.4 25.6 25.9 ...
##  $ PPT.MAD.Z.Score                                                        : num  -2.42 -2.4 -2.36 -2.35 -2.35 ...
##  $ PPT.MAD.Log.MAD.Z.Score                                                : num  -4.82 -4.74 -4.54 -4.52 -4.49 ...
##  $ Median.PPT.Log.Mad.Z.Score.for.all.siRNAs.having.the.same.seed.sequence: num  -2.056 -1.421 -1.421 -1.388 -0.485 ...
##  $ Number.of.siRNAs.having.the.same.seed.sequence                         : int  23 152 152 28 18 28 152 28 75 18 ...
##  $ Cell.Count..Sample                                                     : int  1286 1203 1177 1060 1272 1093 1170 854 1220 1136 ...
##  $ Median.Negative.Control.Cell.Count.on.Plate                            : num  1278 1310 1260 1344 1256 ...
##  $ Median.Positive.Control.Cell.Count.on.Plate                            : num  1030 1050 978 1106 985 ...
##  $ Sample.Cell.Count..MAD.Z.Score.Normalized.to.Negative.Contol           : num  1.109 0.241 0.399 -1.027 1.168 ...
##  $ Sample.1                                                               : num  8.94 16.79 11.3 11.13 12.89 ...
##  $ Median.Negative.Control.Mitophagy.on.Plate                             : num  10.7 28.1 29 15.5 12.1 ...
##  $ Median.Positive.Control.Mitophagy.on.Plate                             : num  22.4 49 50.1 27.7 22 ...
##  $ Sample.Mitophagy..MAD.Z.Score.Normalized.to.Negative.Control           : num  0.0466 -0.8887 -1.7 -0.4281 0.9416 ...
##  $ HUGO.Gene.Symbol                                                       : chr  "LMAN1" "PCQAP" "PFN2" "KIAA1191" ...
##  $ Entrez.GeneID                                                          : int  3998 51586 5217 57179 284221 650689 3998 374986 146723 90378 ...
##  $ REFSEQ                                                                 : chr  "NM_005570" "NM_001003891" "NM_002628" "NM_001079684" ...
##  $ DESCRIPTION                                                            : chr  "lectin, mannose-binding, 1" "mediator complex subunit 15" "profilin 2" "KIAA1191" ...
##  $ PLATE.ID                                                               : chr  "QDA101771" "QDA101996" "QDA103288" "QDA103435" ...
##  $ Row.Number                                                             : int  1 15 14 16 1 2 1 16 10 1 ...
##  $ Column.Number                                                          : int  4 21 19 11 12 1 4 16 1 14 ...
##  $ Ambion.siRNA.ID                                                        : chr  "s8218" "s28366" "s10379" "s226909" ...

Check for missing data

apply(ambion, 2, function(x) sum(is.na(x)))
##                                             Vendor.Supplied.Gene.Symbol 
##                                                                       0 
##                                                                  Sample 
##                                                                     441 
##                                        Median.Negative.Control.on.Plate 
##                                                                     441 
##                                        Median.Positive.Control.on.Plate 
##                                                                     441 
##                            PPT.Sample.as.Percentage.of.Negative.Control 
##                                                                     441 
##                                                         PPT.MAD.Z.Score 
##                                                                     441 
##                                                 PPT.MAD.Log.MAD.Z.Score 
##                                                                     441 
## Median.PPT.Log.Mad.Z.Score.for.all.siRNAs.having.the.same.seed.sequence 
##                                                                     441 
##                          Number.of.siRNAs.having.the.same.seed.sequence 
##                                                                     441 
##                                                      Cell.Count..Sample 
##                                                                     441 
##                             Median.Negative.Control.Cell.Count.on.Plate 
##                                                                     441 
##                             Median.Positive.Control.Cell.Count.on.Plate 
##                                                                     441 
##            Sample.Cell.Count..MAD.Z.Score.Normalized.to.Negative.Contol 
##                                                                     441 
##                                                                Sample.1 
##                                                                     441 
##                              Median.Negative.Control.Mitophagy.on.Plate 
##                                                                     441 
##                              Median.Positive.Control.Mitophagy.on.Plate 
##                                                                     441 
##            Sample.Mitophagy..MAD.Z.Score.Normalized.to.Negative.Control 
##                                                                     441 
##                                                        HUGO.Gene.Symbol 
##                                                                       0 
##                                                           Entrez.GeneID 
##                                                                     441 
##                                                                  REFSEQ 
##                                                                       0 
##                                                             DESCRIPTION 
##                                                                       0 
##                                                                PLATE.ID 
##                                                                       0 
##                                                              Row.Number 
##                                                                     441 
##                                                           Column.Number 
##                                                                     441 
##                                                         Ambion.siRNA.ID 
##                                                                       0

Investigate Missing Data

ambion[is.na(ambion[,1]),][1,]
##    Vendor.Supplied.Gene.Symbol Sample Median.Negative.Control.on.Plate
## NA                        <NA>     NA                               NA
##    Median.Positive.Control.on.Plate
## NA                               NA
##    PPT.Sample.as.Percentage.of.Negative.Control PPT.MAD.Z.Score
## NA                                           NA              NA
##    PPT.MAD.Log.MAD.Z.Score
## NA                      NA
##    Median.PPT.Log.Mad.Z.Score.for.all.siRNAs.having.the.same.seed.sequence
## NA                                                                      NA
##    Number.of.siRNAs.having.the.same.seed.sequence Cell.Count..Sample
## NA                                             NA                 NA
##    Median.Negative.Control.Cell.Count.on.Plate
## NA                                          NA
##    Median.Positive.Control.Cell.Count.on.Plate
## NA                                          NA
##    Sample.Cell.Count..MAD.Z.Score.Normalized.to.Negative.Contol Sample.1
## NA                                                           NA       NA
##    Median.Negative.Control.Mitophagy.on.Plate
## NA                                         NA
##    Median.Positive.Control.Mitophagy.on.Plate
## NA                                         NA
##    Sample.Mitophagy..MAD.Z.Score.Normalized.to.Negative.Control
## NA                                                           NA
##    HUGO.Gene.Symbol Entrez.GeneID REFSEQ DESCRIPTION PLATE.ID Row.Number
## NA             <NA>            NA   <NA>        <NA>     <NA>         NA
##    Column.Number Ambion.siRNA.ID
## NA            NA            <NA>

Eliminate Missing Data

ambion <- ambion[! is.na(ambion[,2]), ]
apply(ambion, 2, function(x) sum(is.na(x)))
##                                             Vendor.Supplied.Gene.Symbol 
##                                                                       0 
##                                                                  Sample 
##                                                                       0 
##                                        Median.Negative.Control.on.Plate 
##                                                                       0 
##                                        Median.Positive.Control.on.Plate 
##                                                                       0 
##                            PPT.Sample.as.Percentage.of.Negative.Control 
##                                                                       0 
##                                                         PPT.MAD.Z.Score 
##                                                                       0 
##                                                 PPT.MAD.Log.MAD.Z.Score 
##                                                                       0 
## Median.PPT.Log.Mad.Z.Score.for.all.siRNAs.having.the.same.seed.sequence 
##                                                                       0 
##                          Number.of.siRNAs.having.the.same.seed.sequence 
##                                                                       0 
##                                                      Cell.Count..Sample 
##                                                                       0 
##                             Median.Negative.Control.Cell.Count.on.Plate 
##                                                                       0 
##                             Median.Positive.Control.Cell.Count.on.Plate 
##                                                                       0 
##            Sample.Cell.Count..MAD.Z.Score.Normalized.to.Negative.Contol 
##                                                                       0 
##                                                                Sample.1 
##                                                                       0 
##                              Median.Negative.Control.Mitophagy.on.Plate 
##                                                                       0 
##                              Median.Positive.Control.Mitophagy.on.Plate 
##                                                                       0 
##            Sample.Mitophagy..MAD.Z.Score.Normalized.to.Negative.Control 
##                                                                       0 
##                                                        HUGO.Gene.Symbol 
##                                                                       0 
##                                                           Entrez.GeneID 
##                                                                       0 
##                                                                  REFSEQ 
##                                                                       0 
##                                                             DESCRIPTION 
##                                                                       0 
##                                                                PLATE.ID 
##                                                                       0 
##                                                              Row.Number 
##                                                                       0 
##                                                           Column.Number 
##                                                                       0 
##                                                         Ambion.siRNA.ID 
##                                                                       0

Simplify the Data

Often it will be helpful to create a new data frame with only the data we wish to analyze.

ambion.simple <- ambion[,c(19,25,1,21,7,13,17)]
ambion.simple[1,]
##   Entrez.GeneID Ambion.siRNA.ID Vendor.Supplied.Gene.Symbol
## 1          3998           s8218                       LMAN1
##                  DESCRIPTION PPT.MAD.Log.MAD.Z.Score
## 1 lectin, mannose-binding, 1                -4.82223
##   Sample.Cell.Count..MAD.Z.Score.Normalized.to.Negative.Contol
## 1                                                     1.108518
##   Sample.Mitophagy..MAD.Z.Score.Normalized.to.Negative.Control
## 1                                                   0.04662911

Simplify our Column Names

library(knitr, quietly = TRUE)
colnames(ambion.simple) <- c("GeneID","siRNA","Symbol","Description",
                             "PPT","Cells","Mitophagy")
kable(head(ambion.simple, n=4), format = "markdown")
GeneID siRNA Symbol Description PPT Cells Mitophagy
3998 s8218 LMAN1 lectin, mannose-binding, 1 -4.822230 1.1085178 0.0466291
51586 s28366 MED15 mediator complex subunit 15 -4.739415 0.2405872 -0.8887362
5217 s10379 PFN2 profilin 2 -4.539309 0.3994161 -1.7000123
57179 s226909 KIAA1191 KIAA1191 -4.516443 -1.0274124 -0.4280631

Evaluate how our variables interact

library(ggplot2, quietly = TRUE)
ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_point(alpha=0.5)

Evaluate how our variables interact

ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_point(alpha=0.5) +
    geom_point(data = ambion.simple[ambion.simple$Symbol == "PARK2",],
               color="blue")

Refine the plot

ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_density2d() +
    geom_point(data = ambion.simple[ambion.simple$Symbol == "PARK2",],
               color="blue")

Further refine the plot

ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_density2d() +
    geom_point(data = ambion.simple[ambion.simple$Symbol == "PARK2",],
               color="red", shape=17, size =5) +
        ggtitle("PARK2")

Adding gene description

description <- ambion.simple$Description[ambion.simple$Symbol == "PARK2"][1]
description
## [1] "Parkinson disease (autosomal recessive, juvenile) 2, parkin"
myTitle <- paste("PARK2",description,sep=": ")
myTitle
## [1] "PARK2: Parkinson disease (autosomal recessive, juvenile) 2, parkin"

Final version of plot

ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_density2d() +
    geom_point(data = ambion.simple[ambion.simple$Symbol == "PARK2",],
               color="red", shape=17, size =5) +
        ggtitle(myTitle)

Making the refined plot into a function

Now that we have our custom plot looking right, we would like to be able to do the same for other genes but without so much typing. First, make a new R Script in RStudio:

Constructing a new function from your history

Frequently, making a function will simply be a function of selecting the right parts of your history and hitting the "to source" button.

Function with PARK2 hard coded

graphGene <- function(gene) {
  description <- ambion.simple$Description[ambion.simple$Symbol == "PARK2"][1]
  myTitle <- paste("PARK2",description,sep=": ")
  ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_density2d() +
    geom_point(data = ambion.simple[ambion.simple$Symbol == "PARK2",],
               color="red", shape=17, size =5) +
    ggtitle(myTitle)
}

Function made generic

graphGene <- function(gene) {
  description <- ambion.simple$Description[ambion.simple$Symbol == gene][1]
  myTitle <- paste(gene,description,sep=": ")
  ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_density2d() +
    geom_point(data = ambion.simple[ambion.simple$Symbol == gene,],
               color="red", shape=17, size =5) +
    ggtitle(myTitle)
}

Our function in action

graphGene("PINK1")

Default values for function arguments

pdfGene <- function(gene, file=paste(gene,".pdf", sep="")) {
    pdf(file, width=5, height=5)
    graphGene(gene)
    dev.off()
}

Passing on extra arguments to our function

We can use the ellipse notation (…) to indicate that extra arguments to our function should be passed on to a function that is inside our function (in this case pdf).

pdfGene <- function(gene, file=paste(gene,".pdf", sep=""), ...) {
    pdf(file, ...)
    graphGene(gene)
    dev.off()
}
pdfGene("PINK1", width=10, height=10)
## quartz_off_screen 
##                 2

Control of Flow: If/Else

We can decide whether something happens in our function using "if" and "if/else".

sillyFunction <- function(x) {
  if (x < 5) {
    returnValue <- x
  }
  else {
    returnValue <- x / 2
  }
  returnValue
}
sillyFunction(12)
## [1] 6

Control of Flow: For

pdfGenes <- function(genes, file=paste(gene,".pdf", sep=""), ...) {
    pdf(file, ...)
    for (gene in genes) {
      graphGene(gene)
    }
    dev.off()
}
pdfGenes(c("PLK1","PINK1","BRCA1"))